AITopics | serbian language

serbian language

Named entity recognition for Serbian legal documents: Design, methodology and dataset development

Kalušev, Vladimir, Brkljač, Branko

arXiv.org Artificial Intelligence, Feb-14-2025

Recent advances in natural language processing (NLP), especially large language models (LLMs) and their numerous applications, have drawn research attention to the design of document processing tools and to improvements in document archiving, search and retrieval. The domain of official, legal documents is especially interesting due to the vast amount of data generated on a daily basis, as well as the significant community of interested practitioners (lawyers, law offices, administrative workers, state institutions and citizens). Providing efficient ways to automate everyday work involving legal documents is therefore expected to have a significant impact in different fields. In this work we present an LLM-based solution for Named Entity Recognition (NER) in legal documents written in the Serbian language. It leverages pre-trained bidirectional encoder representations from transformers (BERT), which have been carefully adapted to the specific task of identifying and classifying specific data points in textual content. Besides the development of a novel dataset for the Serbian language (involving public court rulings), the presented system design and the applied methodology, the paper also discusses the achieved performance metrics and their implications for objective assessment of the proposed solution. Cross-validation tests on the manually labeled dataset, with a mean $F_1$ score of 0.96, and additional results on intentionally modified text inputs confirm the applicability of the proposed system design and the robustness of the developed NER solution.
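To make the reported metric concrete, the following is a minimal sketch of how a mean $F_1$ score over cross-validation folds could be computed. It assumes entity-level matching (an entity counts only when both span and label agree); the abstract does not say whether the authors score at entity or token level. The label names `PER`, `COURT` and `DATE`, the spans, and the two folds are hypothetical illustration data, not taken from the paper.

```python
def f1_score(gold, predicted):
    """Entity-level F1 for one fold.

    Each entity is a (start, end, label) tuple; a prediction is correct
    only if all three fields match a gold entity exactly (an assumption).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(folds):
    """Average the per-fold F1 scores, as in k-fold cross-validation."""
    return sum(f1_score(gold, pred) for gold, pred in folds) / len(folds)


# Hypothetical folds: (gold entities, predicted entities).
folds = [
    # Every entity found: precision = recall = 1.0
    ({(0, 2, "PER"), (5, 7, "COURT")}, {(0, 2, "PER"), (5, 7, "COURT")}),
    # One hit, one miss, one spurious entity: precision = recall = 0.5
    ({(0, 2, "PER"), (5, 7, "COURT")}, {(0, 2, "PER"), (8, 9, "DATE")}),
]

print(mean_f1(folds))  # 0.75
```

Because $F_1$ is the harmonic mean of precision and recall, a score near 1 requires both few missed entities and few spurious ones.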

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.10582

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.05)
Europe > Spain (0.04)

Genre: Research Report (0.50)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Text vectorization via transformer-based language models and n-gram perplexities

Škorić, Mihailo

arXiv.org Artificial Intelligence, Jul-18-2023

Because the probability (and thus the perplexity) of a text is calculated from the product of the probabilities of its individual tokens, a single unlikely token can significantly reduce the probability (i.e., increase the perplexity) of an otherwise highly probable input, even though it may represent a simple typographical error. Also, since perplexity is a scalar value that refers to the entire input, information about the probability distribution within the input is lost in the calculation, especially for longer texts: a relatively good text that has one unlikely token and another text in which every token is equally likely can have the same perplexity value. As an alternative to scalar perplexity, this research proposes a simple algorithm for calculating vector values based on n-gram perplexities within the input. Such representations take the previously mentioned aspects into account: instead of a single value, the relative perplexity of each text token is calculated, and these values are combined into a single vector representing the input.
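The contrast between scalar perplexity and a vector of n-gram perplexities can be illustrated with a small sketch. This is not the paper's exact algorithm: it simply computes the perplexity of every sliding window of `n` tokens. The token probabilities, the window size and the function names are assumptions made for illustration; in practice the probabilities would come from a language model.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability of the tokens."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


def ngram_perplexity_vector(token_probs, n=3):
    """Perplexity of each sliding window of n consecutive tokens (a sketch)."""
    if len(token_probs) < n:
        return [perplexity(token_probs)]
    return [
        perplexity(token_probs[i:i + n])
        for i in range(len(token_probs) - n + 1)
    ]


# Hypothetical input: a fluent text with one very unlikely token (e.g. a typo).
one_outlier = [0.8] * 5 + [0.001] + [0.8] * 2

# Hypothetical input: every token equally likely, chosen so that the scalar
# perplexity matches the first input (probability = geometric mean).
uniform = [1 / perplexity(one_outlier)] * len(one_outlier)

print(perplexity(one_outlier), perplexity(uniform))  # equal up to rounding
print(ngram_perplexity_vector(one_outlier))  # low values, then a local spike
print(ngram_perplexity_vector(uniform))      # flat
```

The two inputs are indistinguishable by scalar perplexity, while the vectors differ: only the windows that contain the unlikely token show elevated values.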

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2307.09255

Country:

Europe > Serbia > Central Serbia > Belgrade (0.05)
Europe > United Kingdom (0.04)
Europe > Austria > Vienna (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

A Survey of Resources and Methods for Natural Language Processing of Serbian Language

#artificialintelligence, Apr-12-2023, 14:20:52 GMT

Serbian is a Slavic language spoken by over 12 million speakers and well understood by over 15 million people. In the area of natural language processing, it can be considered a low-resource language. Serbian is also a highly inflected language. The combination of many word inflections and the low availability of language resources makes natural language processing of Serbian challenging. Nevertheless, over the past three decades, there have been a number of initiatives to develop resources and methods for natural language processing of Serbian, ranging from corpora of free text from books and the internet and annotated corpora for classification and named entity recognition tasks to various methods and models performing these tasks.

natural language processing, resource and method, serbian language, (2 more...)

#artificialintelligence

Genre: Overview (0.99)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)

A Survey of Resources and Methods for Natural Language Processing of Serbian Language

Marovac, Ulfeta A., Avdić, Aldina R., Milošević, Nikola Lj.

arXiv.org Artificial Intelligence, Apr-11-2023

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2304.05468

Country:

Europe > Serbia > Central Serbia > Belgrade (0.04)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
Europe > Serbia > Šumadija and Western Serbia > Raška District > Novi Pazar (0.04)
(22 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry:

Government (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.93)
Materials > Metals & Mining (0.92)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

Machine Learning with ML.NET - NLP with BERT

#artificialintelligence, Jul-7-2021, 12:14:35 GMT

During the training process, the Encoder is supplied with word embeddings from the English language. Computers don't understand words; they understand numbers and matrices (sets of numbers). That is why we convert words into a vector space, meaning that we assign a vector to each word in the language (mapping it to some latent vector space). There are many available word embeddings, such as Word2Vec. However, the position of a word in the sentence is also important for the context.
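The idea of combining a word vector with position information can be sketched as follows. This is only an illustration in Python, not the article's ML.NET code: the toy embedding table is hypothetical, and the sinusoidal positional encoding shown here is the scheme from the original Transformer architecture, whereas BERT learns its position embeddings during training.

```python
import math

# Hypothetical toy embedding table: each word maps to a 4-dimensional vector.
EMBEDDINGS = {
    "dog": [0.2, -0.1, 0.7, 0.4],
    "bites": [0.5, 0.3, -0.2, 0.1],
    "man": [-0.3, 0.6, 0.1, 0.9],
}


def positional_encoding(position, dim):
    """Sinusoidal encoding: sine on even indices, cosine on odd indices."""
    return [
        math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def encode(sentence):
    """Add the positional encoding to each word's embedding vector."""
    tokens = sentence.split()
    dim = len(next(iter(EMBEDDINGS.values())))
    return [
        [e + p for e, p in zip(EMBEDDINGS[tok], positional_encoding(pos, dim))]
        for pos, tok in enumerate(tokens)
    ]


# The same word receives a different input vector at a different position.
print(encode("dog bites man")[0])  # "dog" at position 0
print(encode("man bites dog")[2])  # "dog" at position 2
```

Without the positional term, "dog bites man" and "man bites dog" would present the encoder with the same set of vectors; with it, word order becomes part of the input.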

information, machine learning, serbian language, (6 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

Creating a contemporary corpus of similes in Serbian by using natural language processing

Milosevic, Nikola, Nenadic, Goran

arXiv.org Artificial Intelligence, Nov-22-2018

A simile is a figure of speech that compares two things through the use of connecting words, where the comparison is not intended to be taken literally. Similes are often used in everyday communication, but they are also a part of linguistic cultural heritage. In this paper we present a methodology for semi-automated collection of similes from the World Wide Web using text mining and machine learning techniques. We expanded an existing corpus by collecting 442 similes from the internet and adding them to the existing corpus collected by Vuk Stefanovic Karadzic, which contained 333 similes. We also introduce crowdsourcing to the collection of figures of speech, which helped us to build a corpus containing 787 unique similes.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

1811.10422

Country: Europe > Serbia (0.47)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.48)
